Scheduling Dynamic Parallelism On Accelerators

Author: Benjamin Rose
Costin Iancu
Dimitrios S. Nikolopoulos
Filip Blagojević
Katherine Yelick
Matthew Curtis-Maury
Publication date: 01/01/2009

Resource management on accelerator-based systems is complicated by the disjoint nature of the main CPU and accelerator, which involves separate memory hierarchies, different degrees of parallelism, and a relatively high cost of communicating between them. For applications with irregular parallelism, where work is dynamically created based on other computations, the accelerators may both consume and produce work. To maintain load balance, the accelerators hand work back to the CPU to be scheduled. In this paper we consider multiple approaches for such scheduling problems and use the Cell BE system to demonstrate the different schedulers and the trade-offs between them. Our evaluation is done with both microbenchmarks and two bioinformatics applications (PBPI and RAxML). Our baseline approach uses a standard Linux scheduler on the CPU, possibly with more than one process per CPU. We then consider the addition of cooperative scheduling to the Linux kernel and a user-level work-stealing approach. The two cooperative approaches are able to decrease SPE idle time by 30% and 70%, respectively, relative to the baseline scheduler. In both cases we believe the changes required to application-level codes, e.g., a program written with MPI processes that use accelerator-based compute nodes, are reasonable, although the kernel-level approach provides more generality and ease of implementation but often less performance than the work-stealing approach.

CiteSeerX

Crossref

Energy-Efficient Multithreading through Run-Time Adaptation

Author: Matthew Curtis-Maury
Dimitrios S. Nikolopoulos
Publication venue: Chapman & Hall: CRC Computational Science
Publication date: 01/07/2014

Queen's University Belfast Research Portal

VT-ASOS: Holistic System Software Customization for Many Cores

Author: Dimitrios S. Nikolopoulos
Godmar Back
Jyotirmaya Tripathi
Matthew Curtis-Maury
Publication date: 03/11/2009

VT-ASOS is a framework for holistic and continuous customization of system software on HPC systems. The framework leverages paravirtualization technology. VT-ASOS extends the Xen hypervisor with interfaces, mechanisms, and policies for supporting application-specific resource management schemes on many-core systems, while retaining the advantages of virtualization, including protection, performance isolation, and fault tolerance. We outline the VT-ASOS framework and present results from a preliminary prototype, which enables static customization of scheduler parameters and runtime adaptation of parallel virtual machines.

CiteSeerX

Crossref

Online Power-Performance Adaptation of Multithreaded Programs using Hardware Event-Based Prediction

Author: Christos D. Antonopoulos
Dimitrios S. Nikolopoulos
James Dzierwa
Matthew Curtis-Maury
Publication date: 01/01/2006

With high-end systems featuring multicore/multithreaded processors and high component density, power-aware high-performance multithreading libraries become a critical element of the system software stack. Online power and performance adaptation of multithreaded code from within user-level runtime libraries is a relatively new and unexplored area of research. We present a user-level library framework for nearly optimal online adaptation of multithreaded codes for low-power, high-performance execution. Our framework operates by regulating concurrency and changing the processors/threads configuration as the program executes. It is innovative in that it uses fast runtime performance prediction, derived from hardware event-driven profiling, to select thread granularities that achieve nearly optimal energy-efficiency points. The use of predictors substantially reduces the runtime cost of granularity control and program adaptation. Our framework achieves performance and ED² (energy-delay-squared) levels which are: i) comparable to or better than those of oracle-derived offline predictors; ii) significantly better than those of online predictors using exhaustive or localized linear search. The complete prediction and adaptation framework is implemented on a real multi-SMT system with Intel Hyperthreaded processors and embeds adaptation capabilities in OpenMP programs.

CiteSeerX

Crossref